
    J-MOD²: Joint Monocular Obstacle Detection and Depth Estimation

    In this work, we propose an end-to-end deep architecture that jointly learns to detect obstacles and estimate their depth for MAV flight applications. Most existing approaches rely either on Visual SLAM systems or on depth estimation models to build 3D maps and detect obstacles. However, for the task of avoiding obstacles this level of complexity is not required. Recent works have proposed multi-task architectures that perform both scene understanding and depth estimation. We follow this line of work and propose a specific architecture to jointly estimate depth and obstacles, without the need to compute a global map, while maintaining compatibility with a global SLAM system if needed. The network architecture is devised so that the obstacle detection task, which produces more reliable bounding boxes, and the depth estimation task reinforce each other, increasing the robustness of both to scenario changes. We call this architecture J-MOD². We test the effectiveness of our approach in experiments on sequences with different appearance and focal lengths, and compare it to SotA multi-task methods that jointly perform semantic segmentation and depth estimation. In addition, we demonstrate the integration into a full system through a set of simulated navigation experiments in which a MAV explores an unknown scenario and plans safe trajectories using our detection model.
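
    As a rough illustration of the joint layout described above (a minimal sketch, not the authors' actual J-MOD² design: the layer sizes, the anchor-based detection head, and the input resolution are all invented here), a shared encoder can feed both a depth head and an obstacle-detection head:

```python
import torch
import torch.nn as nn

class JointObstacleDepthNet(nn.Module):
    """Illustrative two-headed network: a shared convolutional encoder
    feeds a dense depth-regression head and an obstacle-detection head,
    so gradients from both tasks shape the common features."""

    def __init__(self, num_anchors: int = 9):
        super().__init__()
        # Shared encoder (placeholder depth/width, not the J-MOD^2 layout).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Depth head: one depth value per feature-map cell.
        self.depth_head = nn.Conv2d(64, 1, 3, padding=1)
        # Detection head: objectness + 4 box offsets per anchor, per cell.
        self.det_head = nn.Conv2d(64, num_anchors * 5, 3, padding=1)

    def forward(self, x):
        feats = self.encoder(x)
        return self.depth_head(feats), self.det_head(feats)

# Both outputs come from a single forward pass over shared features.
depth, boxes = JointObstacleDepthNet()(torch.randn(1, 3, 256, 256))
```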

    AgriSORT: A Simple Online Real-time Tracking-by-Detection framework for robotics in precision agriculture

    The problem of multi-object tracking (MOT) consists of detecting and tracking all the objects in a video sequence while keeping a unique identifier for each object. It is a challenging and fundamental problem for robotics. In precision agriculture, the challenge of achieving a satisfactory solution is amplified by extreme camera motion, sudden illumination changes, and strong occlusions. Most modern trackers rely on the appearance of objects rather than motion for association, which can be ineffective when most targets are static objects with the same appearance, as in the agricultural case. To this end, following SORT [5], we propose AgriSORT, a simple, online, real-time tracking-by-detection pipeline for precision agriculture based only on motion information, which allows for accurate and fast propagation of tracks between frames. The main design goals of AgriSORT are efficiency, flexibility, minimal dependencies, and ease of deployment on robotic platforms. We test the proposed pipeline on a novel MOT benchmark specifically tailored to the agricultural context, based on video sequences taken in a table grape vineyard, which is particularly challenging due to the strong self-similarity and density of the instances. Both the code and the dataset are available for future comparisons.
    Comment: 8 pages, 5 figures, submitted to the International Conference on Robotics and Automation (ICRA) 2024. Code and dataset will soon be available on my GitHub. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
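
    The motion-only association step of a SORT-style tracker can be sketched as follows (a minimal sketch, not the AgriSORT code; the box format, the IoU cost, and the threshold are assumptions): each track's box is predicted forward by a per-track Kalman filter, then matched to the new detections with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def associate(predicted, detections, iou_threshold=0.3):
    """Match Kalman-predicted track boxes to new detections by IoU cost
    using the Hungarian algorithm, as in SORT-style trackers."""
    if len(predicted) == 0 or len(detections) == 0:
        return [], list(range(len(predicted))), list(range(len(detections)))
    cost = np.array([[1.0 - iou(p, d) for d in detections] for p in predicted])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols)
               if cost[r, c] <= 1.0 - iou_threshold]
    unmatched_tracks = [r for r in range(len(predicted))
                        if r not in {m[0] for m in matches}]
    unmatched_dets = [c for c in range(len(detections))
                      if c not in {m[1] for m in matches}]
    return matches, unmatched_tracks, unmatched_dets
```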

    Transferring knowledge across robots: A risk sensitive approach

    One of the most impressive characteristics of human perception is its domain adaptation capability. Humans can recognize objects and places simply by transferring knowledge from their past experience. Inspired by this, current research in robotics is addressing a great challenge: building robots able to sense and interpret the surrounding world by reusing information previously collected, gathered by other robots, or obtained from the web. But how can a robot automatically understand what is useful among a large amount of information and perform knowledge transfer? In this paper we address the domain adaptation problem in the context of visual place recognition. We consider the scenario in which a robot equipped with a monocular camera explores a new environment. In this situation, traditional approaches based on supervised learning perform poorly, as no annotated data are provided in the new environment and the models learned from data collected in other places are inappropriate due to the large variability of visual information. To overcome these problems we introduce a novel transfer learning approach. With our algorithm, the robot is given only some training data (annotated images collected in different environments by other robots) and is able to decide whether, and how much, this knowledge is useful in the current scenario. At the core of our approach is a transfer risk measure, which quantifies the similarity between the given and the new visual data. To improve performance, we also extend our framework to take into account multiple visual cues. Our experiments on three publicly available datasets demonstrate the effectiveness of the proposed approach.
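
    The abstract does not give the risk measure's form; as a hedged stand-in, one common way to quantify similarity between a source (annotated) feature set and a target feature set is the squared Maximum Mean Discrepancy, sketched below (the RBF kernel and its bandwidth are assumptions, not the paper's measure):

```python
import numpy as np

def transfer_risk(source_feats, target_feats, gamma=1.0):
    """Illustrative transfer-risk proxy: squared Maximum Mean Discrepancy
    (MMD^2) with an RBF kernel between source and target feature sets.
    A large value suggests the source knowledge transfers poorly.
    NOTE: a stand-in, not the paper's actual risk measure."""
    def rbf(a, b):
        sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-gamma * sq)
    return (rbf(source_feats, source_feats).mean()
            + rbf(target_feats, target_feats).mean()
            - 2 * rbf(source_feats, target_feats).mean())
```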

    Fast robust monocular depth estimation for Obstacle Detection with fully convolutional networks

    Obstacle detection is a central problem for any robotic system, and critical for autonomous systems that travel at high speeds in unpredictable environments. It is often achieved through scene depth estimation, by various means. When fast motion is considered, the detection range must be long enough to allow for safe avoidance and path planning. Current solutions often make assumptions about the motion of the vehicle that limit their applicability, or work at very limited ranges due to intrinsic constraints. We propose a novel appearance-based obstacle detection system that is able to detect obstacles at very long range and at very high speed (∼300 Hz), without making assumptions about the type of motion. We achieve these results using a deep neural network approach trained on real and synthetic images, trading some depth accuracy for fast, robust, and consistent operation. We show how photo-realistic synthetic images are able to solve the problems of training-set size and variety typical of machine learning approaches, and how our system is robust to massive blurring of test images.
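
    Once a dense depth map is predicted, obstacle candidates can be extracted with simple post-processing, for example by keeping connected regions of pixels closer than a safety range (a minimal sketch; the thresholds and the connected-component step are assumptions, not the paper's pipeline):

```python
import numpy as np
from scipy import ndimage

def obstacles_from_depth(depth_map, max_range=20.0, min_area=50):
    """Flag connected regions of pixels whose estimated depth falls
    inside the safety range; small blobs are discarded as noise.
    Thresholds are invented placeholders."""
    labels, _ = ndimage.label(depth_map < max_range)
    obstacles = []
    for i, region in enumerate(ndimage.find_objects(labels), start=1):
        blob = labels[region] == i            # pixels of this blob only
        if blob.sum() >= min_area:
            nearest = float(depth_map[region][blob].min())
            obstacles.append((region, nearest))  # extent + nearest depth
    return obstacles
```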

    Weakly Supervised Fruit Counting for Yield Estimation Using Spatial Consistency

    Fruit counting is a fundamental component of yield estimation applications. Most existing approaches address this problem by relying on fruit models (i.e., by using object detectors) or by explicitly learning to count. Despite the impressive results achieved by these approaches, all of them need strong supervision during the training phase. In agricultural applications, manual labeling may require a huge effort or, in some cases, it may be impossible to acquire fine-grained ground truth labels. In this letter, we tackle this problem by proposing a weakly supervised framework that learns to count fruits without the need for task-specific supervision labels. In particular, we devise a novel convolutional neural network architecture that requires only a simple image-level binary classifier to detect whether or not the image contains instances of the fruits, and combines this information with image spatial consistency constraints. The result is an architecture that learns to count without task-specific labels (e.g., object bounding boxes or the multiplicity of fruit instances in the image). Experiments on three different varieties of fruits (i.e., olives, almonds, and apples) show that our approach reaches performance comparable with SotA approaches based on the supervised paradigm.
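
    A hedged sketch of how an image-level binary label can be combined with a spatial consistency constraint (the quadrant decomposition, the loss weighting, and the count-to-presence mapping below are illustrative assumptions, not the letter's actual formulation): the count predicted for the whole image should equal the sum of the counts predicted for its parts.

```python
import torch
import torch.nn.functional as F

def weak_counting_loss(count_net, image, has_fruit):
    """count_net: any model mapping a batch of images to a (B,) tensor of
    predicted fruit counts. has_fruit: (B,) float tensor of binary
    image-level labels, the only supervision assumed available."""
    c_full = count_net(image)
    _, _, h, w = image.shape
    quadrants = [image[..., :h // 2, :w // 2], image[..., :h // 2, w // 2:],
                 image[..., h // 2:, :w // 2], image[..., h // 2:, w // 2:]]
    c_parts = torch.stack([count_net(q) for q in quadrants]).sum(dim=0)
    # Spatial consistency: whole-image count must match the sum of parts.
    consistency = F.l1_loss(c_full, c_parts)
    # Weak presence term: positive images should predict a count above ~0.5.
    presence = F.binary_cross_entropy(torch.sigmoid(c_full - 0.5), has_fruit)
    return consistency + presence
```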

    Visual-inertial Tracking on Android for Augmented Reality Applications

    Augmented Reality (AR) aims to enhance a person’s vision of the real world with useful information about the surrounding environment. Amongst all the possible applications, AR systems can be very useful as visualization tools for structural and environmental monitoring. While the large majority of AR systems run on a laptop or on a head-mounted device, the advent of smartphones has created new opportunities. One of the most important functionalities of an AR system is the ability of the device to self-localize. This can be achieved through visual odometry, a very challenging task for smartphones. Indeed, in most available smartphone AR applications, self-localization is achieved through GPS and/or inertial sensors. Hence, developing an AR system on a mobile phone also poses new challenges due to the limited amount of computational resources. In this paper we describe the development of an egomotion estimation algorithm for an Android smartphone. We also present an approach based on an Extended Kalman Filter that improves localization accuracy by integrating information from inertial sensors. The implemented solution achieves a localization accuracy comparable to the PC implementation while running on an Android device.
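
    The predict/update pattern of such a filter can be sketched on a single axis (a deliberately simplified sketch: the real system runs an Extended Kalman Filter over full 3D pose, and the noise values here are invented):

```python
import numpy as np

class VisualInertialFilter:
    """Minimal 1-axis constant-velocity Kalman filter: the inertial
    sensor drives the prediction, visual odometry provides position
    fixes in the update. Illustrative only."""

    def __init__(self):
        self.x = np.zeros(2)   # state: [position, velocity]
        self.P = np.eye(2)     # state covariance

    def predict(self, accel, dt, q=0.1):
        F = np.array([[1.0, dt], [0.0, 1.0]])
        B = np.array([0.5 * dt**2, dt])
        self.x = F @ self.x + B * accel            # integrate IMU acceleration
        self.P = F @ self.P @ F.T + q * np.eye(2)

    def update(self, visual_pos, r=0.5):
        H = np.array([[1.0, 0.0]])                 # we observe position only
        y = visual_pos - H @ self.x                # innovation
        S = H @ self.P @ H.T + r                   # innovation covariance
        K = self.P @ H.T / S                       # Kalman gain
        self.x = self.x + (K * y).ravel()
        self.P = (np.eye(2) - K @ H) @ self.P
```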

    Evaluation of non-geometric methods for visual odometry

    Visual Odometry (VO) is one of the fundamental building blocks of modern autonomous robot navigation and mapping. While most state-of-the-art techniques use geometrical methods for camera ego-motion estimation from optical flow vectors, in the last few years learning approaches have been proposed to solve this problem. These approaches are still emerging and there is much left to explore. This work follows this track, applying kernel machines to monocular visual ego-motion estimation. Unlike geometrical methods, learning-based approaches to monocular visual odometry allow issues such as scale estimation and camera calibration to be overcome, assuming the availability of training data. While some previous works have applied learning paradigms to VO, to our knowledge no extensive evaluation of kernel-based methods for Visual Odometry has been conducted. To fill this gap, in this work we consider publicly available datasets and perform several experiments in order to set a comparison baseline against traditional techniques. Experimental results show good performance of the learning algorithms and establish them as a solid alternative to geometrical techniques, which are computationally intensive and complex to implement.
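
    In this learning-based formulation, VO becomes a regression problem from optical-flow features to frame-to-frame motion. A hedged sketch with kernel ridge regression (the feature layout, kernel choice, and synthetic data are assumptions; the paper evaluates kernel machines more broadly):

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic placeholder data: each row is a flattened set of sampled
# optical-flow vectors; each label is the frame-to-frame ego-motion.
rng = np.random.default_rng(0)
flow_features = rng.normal(size=(500, 128))   # e.g. 64 sampled (du, dv) pairs
ego_motion = rng.normal(size=(500, 3))        # (dx, dy, dyaw) ground truth

# The kernel machine learns the flow-to-motion mapping directly, so scale
# and calibration are absorbed from the training data instead of being
# recovered geometrically.
model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1)
model.fit(flow_features, ego_motion)
predicted_motion = model.predict(flow_features[:1])
```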